Milestone 1

Question 1

Play-by-play data for Regular season and Playoff games was downloaded from the public NHL API using the following endpoint:

https://statsapi.web.nhl.com/api/v1/game/[GAME_ID]/feed/live/

The GAME_ID path parameter was constructed for every Regular season and Playoff game in the 2016-17 through 2020-21 seasons, following the ID scheme described in the unofficial NHL API documentation.

The IDs maintain a specific structure which is as follows:

  1. All the IDs consist of 10 digits.
  2. The first 4 digits identify the season in which the game was played. For example, all games played in the 2018-19 season start with 2018.
  3. The next 2 digits identify the type of game, i.e. whether the game was played in the Preseason, Regular season, Playoffs or All Star.
  4. The last 4 digits identify the specific game within the season and game type denoted by the preceding digits.
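
To make the structure concrete, here is a minimal sketch that splits an ID back into its three parts. The names `GameIdParts` and `split_game_id` are hypothetical helpers written for illustration; they are not part of the scraper described in this post.

```python
from typing import NamedTuple


class GameIdParts(NamedTuple):
    season: str       # first 4 digits, e.g. '2018' for the 2018-19 season
    game_type: str    # next 2 digits, e.g. '02' for Regular season
    game_number: str  # last 4 digits, zero-padded


def split_game_id(game_id: str) -> GameIdParts:
    """Split a 10-digit game ID into season, game type and game number."""
    if len(game_id) != 10 or not game_id.isdigit():
        raise ValueError(f'expected 10 digits, got {game_id!r}')
    return GameIdParts(game_id[:4], game_id[4:6], game_id[6:])


parts = split_game_id('2018020143')
print(parts.season, parts.game_type, parts.game_number)
```

The helper only checks the length and that every character is a digit; it does not verify that the game actually exists.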

The following Python function was used to generate the game IDs.

def get_game_id(self, season: str, game_type: str, game_number: str):
    return f'{season}{game_type}{str(game_number).zfill(4)}'

The game_number parameter in the above function was constructed a bit differently for Regular season and Playoff games.

Regular Season

The ID was fairly straightforward for Regular season games. Here’s how it was constructed:

  • Get the season in which the game was played (e.g. 2016, 2017, 2018, 2019, 2020).
  • Get the corresponding number for the type of game (‘02’ in this case).
  • Get the game number which can range from 1 to the total number of games in a given season.

For example, the ID 2018020143 denotes game number 143 of the 2018-19 Regular season (Vancouver Canucks vs Arizona Coyotes).

The following Python snippet generates the game ID and obtains the corresponding data for Regular season games:

# Loop in a single hockey season e.g. 2016 (2016-17 season), 2020 (2020-21 season)
for season, games in seasons_to_game_volume_map.items():
    # Loop inside a particular game type i.e. regular or playoffs
    for game_type in game_types:
        # Check the game type. If it is regular then the last 4 digits
        # should be the game number
        if game_type == Gametype.REGULAR.name:
            for game_number in range(1, games + 1):
                game_id = self.get_game_id(season, Gametype.REGULAR.value, str(game_number))
                self.scrape_data(game_id, Gametype.REGULAR.name, loc)

Playoffs

The Playoffs consist of 4 rounds with the 1st round having 8 matchups, 2nd round having 4 matchups, 3rd round having 2 matchups and the final round having 1 matchup.

Each matchup can have up to 7 games; games 5, 6 and 7 are played only if needed.

Here’s how the game_number parameter was constructed for Playoff games:

  • First 2 digits specify the round (i.e. ‘01’, ‘02’, ‘03’, ‘04’)
  • The 3rd digit is the matchup number. It can range from 1 to 8 for round 1, 1 to 4 for round 2, 1 to 2 for round 3 and 1 for round 4.
  • The 4th digit is the game number. For each matchup, it can range from 1 to 7.

For example, the game ID 2017030314 denotes the 4th game of the 1st matchup in the 3rd round of 2017-18 Playoffs (Tampa Bay Lightning vs Washington Capitals).

Note: The first 6 digits of the Playoffs game ID follow the same pattern as the Regular season games. The only difference is that the number for type of game is ‘03’.
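
As an illustration of the layout above, the sketch below decodes the last 4 digits of a Playoffs game ID. `decode_playoff_number` is a hypothetical helper, not part of the scraper.

```python
def decode_playoff_number(game_number: str) -> dict:
    """Decode the last 4 digits of a Playoffs game ID."""
    if len(game_number) != 4 or not game_number.isdigit():
        raise ValueError(f'expected 4 digits, got {game_number!r}')
    return {
        'round': int(game_number[:2]),   # digits 1-2: round
        'matchup': int(game_number[2]),  # digit 3: matchup within the round
        'game': int(game_number[3]),     # digit 4: game within the matchup
    }


# The last 4 digits of the example ID 2017030314 are '0314'
info = decode_playoff_number('2017030314'[6:])
print(info)
```

For the example ID this gives round 3, matchup 1, game 4, matching the description above.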

The following Python snippet generates the game ID and obtains the corresponding data for Playoff games:

else:
    total_match_ups = 8
    round_num = 1

    # Continually divide total_match_ups as after each round
    # half of the teams are eliminated
    while total_match_ups != 0:
        for match_up in range(1, total_match_ups + 1):
            for game_number in range(1, 8):
                game_id = self.get_game_id(season, Gametype.PLAYOFFS.value,
                                           f'{str(round_num).zfill(2)}{match_up}{game_number}')
                self.scrape_data(game_id, Gametype.PLAYOFFS.name, loc)
        total_match_ups = total_match_ups // 2
        round_num += 1
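
The same halving loop can be tried on its own, without any network calls. The sketch below is a standalone rewrite for illustration: `playoff_game_ids` is a hypothetical name, and the `'03'` default assumes the Playoffs type code mentioned in the note above.

```python
def playoff_game_ids(season: str, playoff_code: str = '03'):
    """Yield every candidate Playoffs game ID for one season."""
    matchups = 8
    round_num = 1
    # 8 -> 4 -> 2 -> 1 matchups across the four rounds
    while matchups >= 1:
        for matchup in range(1, matchups + 1):
            for game in range(1, 8):  # games 1 to 7 of each matchup
                yield f'{season}{playoff_code}{round_num:02d}{matchup}{game}'
        matchups //= 2
        round_num += 1


ids = list(playoff_game_ids('2017'))
print(len(ids))         # 105, i.e. (8 + 4 + 2 + 1) matchups x 7 games
print(ids[0], ids[-1])  # 2017030111 2017030417
```

These are candidate IDs only: since games 5 to 7 are not always played, some of them will not correspond to a real game.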

The scrape_nhl_data class downloads all the data. Its code is as follows:

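import json
import os
import time

import requests as req

# Gametype is an Enum defined alongside this class in make_dataset;
# Gametype.REGULAR.value is '02' and Gametype.PLAYOFFS.value is '03'.


class scrape_nhl_data:
    def get_game_id(self, season: str, game_type: str, game_number: str):
        return f'{season}{game_type}{str(game_number).zfill(4)}'

    def write_data(self, loc: str, game_type: str, content: dict):
        # Skip non-regular (playoff) games whose feed has no end time
        if game_type != Gametype.REGULAR.name and 'endDateTime' not in content['gameData']['datetime']:
            return
        with open(f'{loc}.json', 'w+', encoding='utf-8') as f:
            json.dump(content, f, ensure_ascii=False, indent=4)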

    def scrape_data(self, game_id, game_type, path):
        endpoint = f'https://statsapi.web.nhl.com/api/v1/game/{game_id}/feed/live/'
        try:
            time.sleep(0.5)
            res = req.get(endpoint)
            res.raise_for_status()
            self.write_data(f'{path}/{game_id}', game_type, res.json())
        except req.exceptions.HTTPError as err:
            print(f'API failed for {game_id} with status code {err.response.status_code}')
        except Exception as e:
            print(f'{game_type} trace: {endpoint} {game_id}')
            print(e)

    def get_play_by_play_data(self,
                              path: str,
                              seasons_to_game_volume_map: dict,
                              game_types: list
                              ):
        """Create folders and individual JSON files for each season and game type.

                Arguments:
                    path (str): Location where the files should be created.
                    Ideally it should be the 'data' folder of our repository.

                    Note: Do not precede the path with a '/'. If the data
                    needs to be saved in the same directory as this script then
                    pass an empty string ''.

                    seasons_to_game_volume_map (dict of str: int): Map of seasons
                    for which the data is required and the corresponding number of
                    games in that season. For example, the key is '2016'
                    for the 2016-17 season.

                    game_types (list of str): List of game types for which
                    data needs to be retrieved.

                Return:
                    None. The data is written to disk as one folder per hockey
                    season; these folders in turn contain play-by-play data for
                    regular and playoff games.
        """
        # Loop in a single hockey season e.g. 2016 (2016-17 season), 2020 (2020-21 season)
        for season, games in seasons_to_game_volume_map.items():
            # Loop inside a particular game type i.e. regular or playoffs
            for game_type in game_types:
                if len(path.strip()) == 0:
                    loc = f'{season}/{game_type}'
                else:
                    loc = f'{path}/{season}/{game_type}'

                if not os.path.exists(loc):
                    os.makedirs(loc)

                # Check the game type. If it is regular then the last 4 digits
                # should be the game number
                if game_type == Gametype.REGULAR.name:
                    for game_number in range(1, games + 1):
                        game_id = self.get_game_id(season, Gametype.REGULAR.value, str(game_number))
                        self.scrape_data(game_id, Gametype.REGULAR.name, loc)

                # Otherwise, game_type == 'playoff' and the last 4 digits
                # should be composed as follows:
                # first 2 digits -> round number (can be 01, 02, 03, 04)
                # third digit -> match up (can be up to 8, 4, 2, 1 for
                # the above mentioned round numbers)
                # fourth digit -> game number (can be from 1 to 7)
                else:
                    total_match_ups = 8
                    round_num = 1

                    # Continually divide total_match_ups as after each round
                    # half of the teams are eliminated
                    while total_match_ups != 0:
                        for match_up in range(1, total_match_ups + 1):
                            for game_number in range(1, 8):
                                game_id = self.get_game_id(season, Gametype.PLAYOFFS.value,
                                                           f'{str(round_num).zfill(2)}{match_up}{game_number}')
                                self.scrape_data(game_id, Gametype.PLAYOFFS.name, loc)
                        total_match_ups = total_match_ups // 2
                        round_num += 1
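
One detail in write_data is easy to miss: a Playoffs response is saved only if its feed contains an `endDateTime`. A plausible reading is that this skips candidate IDs for games that were never played, although that intent is an inference rather than something stated in the code. The sketch below isolates the check; `has_end_time` is a hypothetical helper, and the two dictionaries are hand-written stand-ins, not real API responses.

```python
def has_end_time(content: dict) -> bool:
    """Return True if the feed reports an end time for the game."""
    return 'endDateTime' in content.get('gameData', {}).get('datetime', {})


# Hand-written stand-ins that only mimic the keys used by write_data
finished = {'gameData': {'datetime': {'endDateTime': '<end timestamp>'}}}
no_end = {'gameData': {'datetime': {}}}

print(has_end_time(finished))  # True
print(has_end_time(no_end))    # False
```

Unlike the original check, this version uses `dict.get`, so a response without a `gameData` key is treated as having no end time instead of raising a `KeyError`.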

The function get_play_by_play_data needs to be called with the download path, seasons and their corresponding number of games and the game types for which the data is to be downloaded.

Here is an example of how this function can be called:

from make_dataset import Gametype, scrape_nhl_data

scraper = scrape_nhl_data()
season_data = {'2016': 1230, '2017': 1271, '2018': 1271, '2019': 1271, '2020': 868}
game_types = [Gametype.REGULAR.name, Gametype.PLAYOFFS.name]
# Nothing is returned; the JSON files are written under <season>/<game type>/
scraper.get_play_by_play_data(path='', seasons_to_game_volume_map=season_data, game_types=game_types)

Question 2

…content…

IFT6758 Demo Post

This post outlines a few more things you may need to know for creating and configuring your blog posts. If you are interested in more general template features or syntax, you can visit the Introducing Lanyon or the Example Content posts.

Configurations

You should modify some of the default values in _config.yml, found in the root directory of this repo. Things like the title, tagline, description, author information, etc. are all fair game to modify. Be more careful when modifying the url information, since things can break if done incorrectly (these values are used if you are deploying via GitHub Pages).

Creating Posts

To create a new post in the blog, add a new Markdown file to the _posts/ directory, with the name following the format YYYY-MM-DD-postname.md. Begin the post with the following code:

---
layout: post
title: [POST TITLE]
---

From there, write your content as you would a normal Markdown file. In general, I would recommend writing one sentence per line. This is not required, but this is far easier to work with than having a single giant line of multiple sentences for a single paragraph.
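
As a rough sketch of the naming convention above, the helpers below build a post file name and the opening front matter. `post_filename` and `front_matter` are hypothetical names, the rule for turning a title into `postname` (lowercase, hyphen-separated) is an assumption, and the date is an arbitrary example.

```python
import re
from datetime import date


def post_filename(title: str, day: date) -> str:
    """Build a file name of the form YYYY-MM-DD-postname.md."""
    # Assumed slug rule: lowercase, runs of other characters become '-'
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')
    return f'{day.isoformat()}-{slug}.md'


def front_matter(title: str) -> str:
    """Return the front matter block that opens a post."""
    return f'---\nlayout: post\ntitle: {title}\n---\n'


print(post_filename('Milestone 1', date(2021, 10, 15)))  # 2021-10-15-milestone-1.md
print(front_matter('Milestone 1'))
```

The resulting file would go in the _posts/ directory, with the post content written below the front matter.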

Interactive plots

Here’s how you could embed interactive figures that have been exported as HTML files. Note that we will be using plotly for this demo, but anything that allows you to export to HTML should work. All that’s required is for you to export your figure into HTML format, and make sure that the file exists in the _includes directory in this repository’s root directory. To embed it into any page, simply insert the following code anywhere into your page.

{% include [FIGURE_NAME].html %} 

For example, the following code can be used to generate the figure underneath it.

import pandas as pd
import plotly.express as px

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')

fig = px.density_mapbox(df, lat='Latitude', lon='Longitude', z='Magnitude', radius=10,
                        center=dict(lat=0, lon=180), zoom=0,
                        mapbox_style="stamen-terrain")
fig.show()

fig.write_html('./_includes/plotly_demo_1.html')

The above figure is pretty cool, but you can also embed heavier/more complex figures. For brevity, the following figure is generated from the included plotly_html.ipynb notebook file in the repo’s root directory.

Introducing Lanyon

Lanyon is an unassuming Jekyll theme that places content first by tucking away navigation in a hidden drawer. It’s based on Poole, the Jekyll butler.

Built on Poole

Poole is the Jekyll Butler, serving as an upstanding and effective foundation for Jekyll themes by @mdo. Poole, and every theme built on it (like Lanyon here) includes the following:

  • Complete Jekyll setup included (layouts, config, 404, RSS feed, posts, and example page)
  • Mobile friendly design and development
  • Easily scalable text and component sizing with rem units in the CSS
  • Support for a wide gamut of HTML elements
  • Related posts (time-based, because Jekyll) below each post
  • Syntax highlighting, courtesy Pygments (the Python-based code snippet highlighter)

Lanyon features

In addition to the features of Poole, Lanyon adds the following:

  • Toggleable sliding sidebar (built with only CSS) via link in top corner
  • Sidebar includes support for textual modules and a dynamically generated navigation with active link support
  • Two orientations for content and sidebar, default (left sidebar) and reverse (right sidebar), available via <body> classes
  • Eight optional color schemes, available via <body> classes

Head to the readme to learn more.

Browser support

Lanyon is by preference a forward-thinking project. In addition to the latest versions of Chrome, Safari (mobile and desktop), and Firefox, it is only compatible with Internet Explorer 9 and above.

Download

Lanyon is developed on and hosted with GitHub. Head to the GitHub repository for downloads, bug reports, and feature requests.

Thanks!

Example content

Howdy! This is an example blog post that shows several types of HTML content supported in this theme.

Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Aenean eu leo quam. Pellentesque ornare sem lacinia quam venenatis vestibulum. Sed posuere consectetur est at lobortis. Cras mattis consectetur purus sit amet fermentum.

Curabitur blandit tempus porttitor. Nullam quis risus eget urna mollis ornare vel eu leo. Nullam id dolor id nibh ultricies vehicula ut id elit.

Etiam porta sem malesuada magna mollis euismod. Cras mattis consectetur purus sit amet fermentum. Aenean lacinia bibendum nulla sed consectetur.

Inline HTML elements

HTML defines a long list of available inline tags, a complete list of which can be found on the Mozilla Developer Network.

  • To bold text, use <strong>.
  • To italicize text, use <em>.
  • Abbreviations, like HTML, should use <abbr>, with an optional title attribute for the full phrase.
  • Citations, like — Mark Otto, should use <cite>.
  • Deleted text should use <del> and inserted text should use <ins>.
  • Superscript text uses <sup> and subscript text uses <sub>.

Most of these elements are styled by browsers with few modifications on our part.

Heading

Vivamus sagittis lacus vel augue rutrum faucibus dolor auctor. Duis mollis, est non commodo luctus, nisi erat porttitor ligula, eget lacinia odio sem nec elit. Morbi leo risus, porta ac consectetur ac, vestibulum at eros.

Code

Cum sociis natoque penatibus et magnis dis code element montes, nascetur ridiculus mus.

// Example can be run directly in your JavaScript console

// Create a function that takes two arguments and returns the sum of those arguments
var adder = new Function("a", "b", "return a + b");

// Call the function
adder(2, 6);
// > 8

Aenean lacinia bibendum nulla sed consectetur. Etiam porta sem malesuada magna mollis euismod. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa.

Lists

Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Aenean lacinia bibendum nulla sed consectetur. Etiam porta sem malesuada magna mollis euismod. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus.

  • Praesent commodo cursus magna, vel scelerisque nisl consectetur et.
  • Donec id elit non mi porta gravida at eget metus.
  • Nulla vitae elit libero, a pharetra augue.

Donec ullamcorper nulla non metus auctor fringilla. Nulla vitae elit libero, a pharetra augue.

  1. Vestibulum id ligula porta felis euismod semper.
  2. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
  3. Maecenas sed diam eget risus varius blandit sit amet non magna.

Cras mattis consectetur purus sit amet fermentum. Sed posuere consectetur est at lobortis.

  • HyperText Markup Language (HTML): The language used to describe and define the content of a Web page
  • Cascading Style Sheets (CSS): Used to describe the appearance of Web content
  • JavaScript (JS): The programming language used to build advanced Web sites and applications

Integer posuere erat a ante venenatis dapibus posuere velit aliquet. Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Nullam quis risus eget urna mollis ornare vel eu leo.

Tables

Aenean lacinia bibendum nulla sed consectetur. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Name       Upvotes   Downvotes
Alice      10        11
Bob        4         3
Charlie    7         9
Totals     21        23

Nullam id dolor id nibh ultricies vehicula ut id elit. Sed posuere consectetur est at lobortis. Nullam quis risus eget urna mollis ornare vel eu leo.


Want to see something else added? Open an issue.

What's Jekyll?

Jekyll is a static site generator, an open-source tool for creating simple yet powerful websites of all shapes and sizes. From the project’s readme:

Jekyll is a simple, blog aware, static site generator. It takes a template directory […] and spits out a complete, static website suitable for serving with Apache or your favorite web server. This is also the engine behind GitHub Pages, which you can use to host your project’s page or blog right here from GitHub.

It’s an immensely useful tool and one we encourage you to use here with Lanyon.

Find out more by visiting the project on GitHub.